Conversation

@ayushag-nv ayushag-nv (Contributor) commented Nov 6, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multimodal decode worker configuration, enabling disaggregated multimodal serving with independent decode components.
    • Introduced new launch script for orchestrated deployment of multimodal models across separate frontend, processor, encoding, prefill, and decode workers.
  • Chores

    • Removed legacy multimodal deployment script.

Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv requested review from a team as code owners November 6, 2025 05:54
copy-pr-bot bot commented Nov 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv marked this pull request as draft November 6, 2025 05:54
@github-actions github-actions bot added the chore label Nov 6, 2025
@rmccorm4 rmccorm4 added backend::vllm Relates to the vllm backend multimodal labels Nov 6, 2025
Signed-off-by: ayushag <[email protected]>
@krishung5 krishung5 (Contributor) left a comment

LGTM! Please remove all the debug logs and the old disagg_multimodal.sh file in the example/multimodal folder before merging.

Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv marked this pull request as ready for review November 7, 2025 22:00
coderabbitai bot (Contributor) commented Nov 7, 2025

Walkthrough

Added multimodal decode worker support to the vLLM configuration system, updated handler initialization logic to route between decode and prefill+worker handlers based on configuration, introduced a new disaggregated multimodal serving script, and removed an obsolete orchestration script.

Changes

Configuration & Arguments: components/src/dynamo/vllm/args.py

  • Added the multimodal_decode_worker flag to the Config class and introduced the --multimodal-decode-worker CLI argument.
  • Integrated the flag into multimodal aggregation and updated error messaging.
  • Added conditional logic to set the component to "decoder" and the endpoint to "generate" for the decode worker; adjusted prefill+worker routing to use the "backend" component.

Worker Initialization & Handler Routing: components/src/dynamo/vllm/main.py

  • Expanded multimodal worker initialization to select between MultimodalDecodeWorkerHandler and MultimodalPDWorkerHandler based on config.multimodal_decode_worker.
  • Added decode worker client creation and connection for disaggregated mode.
  • Updated the MultimodalPDWorkerHandler signature to accept decode_worker_client instead of downstream_client; extended the multimodal component initialization conditions to include the decode worker flag.

Orchestration Scripts: examples/backends/vllm/launch/disagg_multimodal.sh

  • New Bash script for disaggregated multimodal serving: parses --model and --prompt-template CLI options, selects template defaults for llava-1.5-7b-hf, Phi-3.5-vision-instruct, and Qwen2.5-VL-7B-Instruct, launches the frontend, processor, and three workers (encode, prefill, decode) on distinct GPUs, and applies model-specific GPU memory tuning.

Legacy Scripts: examples/multimodal/launch/disagg.sh

  • Removed the orchestration script that launched the multimodal workflow with Ingress, processor, and worker services.
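The handler-selection change in main.py summarized above can be sketched as follows. This is an illustrative stand-in, not the actual dynamo.vllm code: the Config class here is a minimal mock, and the real initialization constructs handler objects rather than returning names.

```python
# Illustrative sketch of the multimodal handler routing summarized above.
# Config is a minimal stand-in for the real dynamo.vllm Config class; the
# returned strings name the handlers main.py selects.
from dataclasses import dataclass


@dataclass
class Config:
    multimodal_decode_worker: bool = False
    is_prefill_worker: bool = False


def select_multimodal_handler(config: Config) -> str:
    if config.multimodal_decode_worker:
        # Decode worker: standalone handler, no downstream client needed.
        return "MultimodalDecodeWorkerHandler"
    # Prefill (or combined) worker: in disaggregated mode this handler is
    # constructed with a decode_worker_client instead of downstream_client.
    return "MultimodalPDWorkerHandler"
```

Per the walkthrough, the decode-worker branch is the new path; all other multimodal worker configurations continue through MultimodalPDWorkerHandler.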

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Key areas requiring attention:

  • Verify handler selection logic correctly routes between MultimodalDecodeWorkerHandler and MultimodalPDWorkerHandler based on configuration
  • Review decode worker client initialization and connection flow in disaggregated mode, especially component and endpoint routing changes
  • Validate CLI argument parsing and Config propagation for multimodal_decode_worker flag
  • Confirm GPU memory tuning parameters and process launch ordering in the new disaggregated multimodal script are correct for target models
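As a concrete reference for reviewing the new script, its per-model default template selection might look like the sketch below. This is hypothetical: the template strings are placeholders, and the real values live in disagg_multimodal.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the model -> default prompt-template selection that
# disagg_multimodal.sh performs for its --prompt-template option. The
# template strings below are placeholders, not the script's actual values.
default_template() {
    case "$1" in
        *llava-1.5-7b-hf*)         echo "<llava default template>" ;;
        *Phi-3.5-vision-instruct*) echo "<phi default template>" ;;
        *Qwen2.5-VL-7B-Instruct*)  echo "<qwen default template>" ;;
        *)
            echo "no default template for model: $1" >&2
            return 1
            ;;
    esac
}
```

Matching on substrings of the model ID (rather than exact names) lets the same case arm cover both bare model names and org-prefixed paths like llava-hf/llava-1.5-7b-hf.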

Poem

🐰 A decode worker hops into view,
New flags and handlers, so fresh and new!
Disaggregated dreams split across GPUs bright,
While old scripts retire—farewell to the night!
Multimodal magic, now routing with care,
Through prefill and decode, everywhere! 🌟

Pre-merge checks

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check (⚠️ Warning): The PR description is entirely empty template placeholders with no actual content, making it impossible to understand the purpose or scope of the changes. Resolution: fill in all sections with concrete details, describe the multimodal disaggregated serving changes, specify files to review, and link the related GitHub issue.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'chore: mm epd disagg' is vague and uses unclear abbreviations that don't convey what changed. Resolution: expand the abbreviations and be more specific, e.g. 'Add disaggregated multimodal serving configuration with decode worker support'.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/backends/vllm/launch/disagg_multimodal.sh (2)

73-104: Consider adding health checks between component launches.

The script launches all components in rapid succession without waiting for dependencies to be ready. For example, the encode worker (line 89) may start before the processor (line 79) is ready to receive connections.

Consider one of these approaches:

Option 1: Add sleep delays between launches

 # Start processor
 echo "Starting processor..."
 python -m dynamo.vllm --multimodal-processor --model $MODEL_NAME --mm-prompt-template "$PROMPT_TEMPLATE" &
+sleep 5
 
 # Configure GPU memory optimization for specific models

Option 2: Add health check polling (more robust)

After each component launch, add a function to poll its health endpoint:

wait_for_service() {
    local port=$1
    local max_attempts=30
    for i in $(seq 1 $max_attempts); do
        if curl -s "http://localhost:$port/health" > /dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    echo "Service on port $port failed to start"
    exit 1
}

87-97: Document or validate GPU availability.

The script assumes GPUs 1, 2, and 3 are available but doesn't validate this. Consider adding a GPU count check at the start or documenting the minimum GPU requirement.

Add GPU validation:

# After line 71, before starting components
GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
if [ $GPU_COUNT -lt 4 ]; then
    echo "Error: This script requires at least 4 GPUs, but only $GPU_COUNT found"
    exit 1
fi

Or document the requirement in the help text:

 echo "Disaggregated multimodal serving with separate Encode/Prefill/Decode workers"
 echo ""
+echo "Requirements: At least 4 NVIDIA GPUs"
+echo ""
 echo "Options:"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6336eea and ecda34b.

📒 Files selected for processing (4)
  • components/src/dynamo/vllm/args.py (5 hunks)
  • components/src/dynamo/vllm/main.py (3 hunks)
  • examples/backends/vllm/launch/disagg_multimodal.sh (1 hunks)
  • examples/multimodal/launch/disagg.sh (0 hunks)
💤 Files with no reviewable changes (1)
  • examples/multimodal/launch/disagg.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • components/src/dynamo/vllm/main.py
📚 Learning: 2025-06-05T01:04:24.775Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.

Applied to files:

  • components/src/dynamo/vllm/main.py
🧬 Code graph analysis (1)
components/src/dynamo/vllm/main.py (3)
components/src/dynamo/vllm/multimodal_handlers/worker_handler.py (2)
  • MultimodalDecodeWorkerHandler (25-81)
  • MultimodalPDWorkerHandler (84-260)
lib/bindings/python/src/dynamo/_core.pyi (5)
  • namespace (42-46)
  • component (88-92)
  • endpoint (117-121)
  • client (154-158)
  • wait_for_instances (193-200)
lib/bindings/python/rust/lib.rs (5)
  • namespace (491-496)
  • component (815-821)
  • endpoint (703-709)
  • client (785-799)
  • wait_for_instances (840-849)
🔇 Additional comments (9)
components/src/dynamo/vllm/args.py (3)

72-72: LGTM! Consistent flag implementation.

The multimodal_decode_worker flag is properly integrated across class attributes, CLI arguments, exclusivity checks, error messaging, and config propagation.

Also applies to: 174-178, 227-227, 232-232, 274-274


245-249: LGTM! Clear component routing for decode worker.

The decode worker correctly uses "decoder" as the component name to enable prefill worker connections in disaggregated mode.


250-253: LGTM! Critical routing for multimodal prefill worker.

The multimodal prefill worker correctly uses "backend" as the component name to maintain the encoder→backend connection, which differs from the standard prefill worker component naming. The comment clearly explains this design decision.
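The component naming called out in these two comments can be condensed into a small sketch. The function below is illustrative only (the real logic lives inline in args.py); the component-name strings come from the comments above.

```python
# Sketch of the component routing noted above: the decode worker registers
# as "decoder" (with endpoint "generate") so prefill workers can find it,
# while the multimodal prefill worker keeps the "backend" component name so
# the encoder -> backend connection is preserved. Illustrative only.
def multimodal_component_name(multimodal_decode_worker: bool,
                              is_prefill_worker: bool) -> str:
    if multimodal_decode_worker:
        return "decoder"
    if is_prefill_worker:
        # Unlike the standard prefill worker, multimodal prefill stays on
        # "backend" so the encode worker can still reach it.
        return "backend"
    return "backend"
```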

examples/backends/vllm/launch/disagg_multimodal.sh (3)

1-6: LGTM! Proper script initialization.

Good use of set -e for early exit on errors and trap for cleanup of background processes.


12-47: LGTM! Clear CLI interface.

The command-line argument parsing is well-structured with helpful usage information and examples.


49-63: LGTM! Sensible template defaults.

Model-specific prompt templates are properly defined with a clear fallback mechanism for unsupported models.

components/src/dynamo/vllm/main.py (3)

32-32: LGTM! Clean import addition.


109-113: LGTM! Correct routing for decode worker.

The condition appropriately routes multimodal_decode_worker to init_multimodal_worker, where handler selection occurs based on the specific worker type.


639-660: LGTM! Correct handler selection and client wiring.

The logic correctly distinguishes between:

  1. Decode worker (multimodal_decode_worker=True): Uses MultimodalDecodeWorkerHandler without needing a downstream client
  2. Prefill worker (is_prefill_worker=True): Creates decode_worker_client and passes it to MultimodalPDWorkerHandler for disaggregated mode

The handler signatures match the constructor definitions, and wait_for_instances() is properly called before use (line 649).

